Dislocation Substructure Evolution and an Informer Constitutive Model for a Ti-55511 Alloy in Two-Stage High-Temperature Forming with Variant Strain Rates in the β Region

The high-temperature compression characteristics of a Ti-55511 alloy are explored through two-stage high-temperature compression experiments with step-like strain rates. The evolving features of dislocation substructures over the hot compression parameters are revealed by transmission electron microscopy (TEM). The experimental results suggest that dislocation annihilation through the rearrangement/interaction of dislocations is aggravated with increasing forming temperature. Notwithstanding, the generation/interlacing of dislocations exhibits an enhanced trend with increasing strain in the first stage of forming, or with increasing strain rate at the first/second stages of the high-temperature compression process. Based on the testing data, an Informer deep learning model is proposed for reconstructing the stress-strain behavior of the researched Ti-55511 alloy. The input series of the established Informer deep learning model are the compression parameters (compression temperature, strain, and strain rate), and the output series are the true stresses. The optimal input batch size and sequence length are 64 and 2, respectively. Eventually, the predicted results of the proposed Informer deep learning model agree more closely with the tested true stresses than those of the previously established physical mechanism model, demonstrating that the Informer deep learning model possesses an outstanding forecasting capability for precisely reconstructing the high-temperature compression features of the Ti-55511 alloy.


Introduction
Due to their outstanding properties, including mechanical strength, corrosion resistance, and heat treatability, near-β titanium alloys are widely applied in the manufacture of crucial load-bearing aircraft components [1,2]. Usually, hot deformation is utilized to improve the microstructures and further optimize the service performance of titanium alloys [3][4][5]. The coupling effects of multiple forming parameters induce intricate evolving characteristics in the microstructures and high-temperature flow behavior of titanium alloys [6][7][8][9][10]. Hence, investigating the microstructural evolution and accurately modeling the true stress-strain characteristics of titanium alloys are significant.
To this day, numerous investigations have been devoted to exploring the microstructural evolution mechanisms of titanium alloys [11][12][13][14][15]. Some reports [16,17] revealed the substructural evolving features of multiple titanium alloys during thermal forming and found that the substructure nucleation/migration mechanisms were substantially affected by the processing parameters. Meanwhile, it was found that the evolution of substructures could exert a prominent effect on the nucleation/coarsening of dynamic recrystallization (DRX) grains.

Experimental Material and Procedure
The commercial near-β titanium alloy was employed in the present investigation. The chemical composition (wt.%) of the researched titanium alloy was 5.16Al-4.92Mo-4.96V-1.10Cr-0.98Fe-(bal.)Ti. Cylindrical specimens (Φ8 mm × 12 mm) for thermal compression were manufactured. A Gleeble-3500 device was employed for conducting the two-stage thermal compression experiments. Figure 1 shows the explicit experimental procedures. Distinctly, all forming processes contain two compression stages (I and II). The compression temperature (T) and the total strain (ε_total) were consistent across the two stages. Here, three compression temperatures (890 °C, 920 °C, and 950 °C) and a constant ε_total of 1.2 were adopted. Still, different strain rates were exploited in each compression stage. A representative complete compression experiment proceeds as follows: the specimen was thermally compressed at the strain rate of the first compression stage (ε̇_I) until the strain of stage I (ε_I) was reached, and then thermal compression was executed at the strain rate of the second compression stage (ε̇_II). Correspondingly, three values of ε_I (0.3, 0.6, and 0.9) were adopted. Before thermal compression, each sample was heated to the compression temperature at 10 °C/s and held for 300 s. When the thermal compression process was finished, the compressed blocks were directly water-quenched (about 25 °C). To dissect the evolving features of substructures in thermal compression, transmission electron microscopy (TEM) was adopted. To characterize the original microstructure, electron backscatter diffraction (EBSD) was chosen. For the TEM and EBSD analyses, the thermally compressed samples were axially machined to acquire cross-sections. Afterwards, these sections were ground, polished, and etched in a solution (10 mL HClO4 + 70 mL C4H10O + 120 mL CH3OH). Figure 2 displays the original grain structures, and most of the initial β grains are equiaxed.

High-Temperature Compression Features and Substructural Evolution
The prime hot flow features of the researched titanium alloy in double-stage hot compression with stepped strain rates are displayed in Figure 3. Clearly, the high-temperature compression behaviors are markedly affected by the compression parameters. As revealed in Figure 3a, the true stresses at the first and second stages of hot compression exhibit a diminishing tendency with rising compression temperature. One principal reason for this experimental result is that the DRX behavior dramatically proceeds as the compression temperature (T) ascends [6]. Moreover, visible evolution of the substructures occurs with elevated compression temperature, as depicted in Figure 4a,b. At the compression temperature of 920 °C and strain rate of 0.01 s⁻¹, the formation of high-density dislocation clusters can be detected (Figure 4a). Then, a prominent work-hardening (WH) effect is induced owing to the acute interaction of adjacent substructures, and the true stress rises quickly [6,16]. When the compression temperature is elevated from 920 °C to 950 °C, intensive migration/interaction of dislocations and grain boundaries occurs, and the substructures are apparently consumed (Figure 4b). Then, a reinforced dynamic softening feature emerges with rising compression temperature, and a decrease in true stress appears. Furthermore, the true stress at the second stage of high-temperature compression exhibits an increasing trend with the rise in the strain of the first-stage compression (ε_I), as displayed in Figure 3b. This tested result is primarily ascribed to the weakened DRX development occurring at large values of ε_I, as the strain rate is transferred from a high value (ε̇_I = 0.1 s⁻¹) to a low value (ε̇_II = 0.001 s⁻¹) [6]. Meanwhile, the significant influence of variations in ε_I on the substructural evolution is depicted in Figure 4c. From Figure 4a,c, it can be detected that the generation/accumulation of substructures (subgrains, dislocation networks, etc.) is promoted with increasing ε_I. Owing to the formation of high-density dislocation networks, the resistance to dislocation slippage and grain boundary motion is raised, inducing the rise in true stress at the second stage of high-temperature compression.


The Informer Deep Learning Model for Forecasting Hot Flow Features of a Ti-55511 Alloy
In contrast to existing models with lengthy process limitations, the Transformer model demonstrates operational potential for long-sequence prediction, owing to its innovative architecture and self-attention mechanism [56]. Although the canonical self-attention mechanism is capable of processing large-scale data with impressive performance, the high computational complexity and significant memory consumption of the model's stacking layers impede its practical application. To address this deficiency, optimized models such as the LogSparse Transformer model [57] and similar models [58] were proposed to reduce the complexity of the original self-attention mechanism, but their efficiency remained limited. Moreover, the Reformer model was embedded with locality-sensitive hashing self-attention to reduce the per-layer complexity on exceptionally long-term series [59]. In certain situations, the complexity growth rate of the Linformer model was optimized to be linear, but the model could potentially experience degradation in practical long-term prediction [60]. More recently, a continuous-space attention mechanism was deployed in the Infinite Memory Transformer model to free the complexity from the input length, but the prediction accuracy decreased [61].
In summary, previous Transformer variants focused on optimizing the complexity of the attention mechanism in each layer and obtained important findings. However, simultaneously cutting down the complexity and breaking the scalability bottleneck of the stacking layers is rarely addressed. Therefore, the Informer deep learning model was proposed to address these limitations and accelerate computing speed [54]. In the present research, the Informer deep learning model is applied as a practical method for forecasting the flow characteristics of the studied titanium alloy. Specifically, the Informer deep learning model leverages the ProbSparse self-attention mechanism and a distilling operation to reduce the time complexity and memory usage of the dependency alignment to O(L log L), and the space complexity to O((2 − ε)L log L). During the inference phase, the model utilizes a generative decoder to avoid cumulative error spreading and to optimize long-series output. The Informer deep learning model architecture is shown in Figure 5.


ProbSparse Self-Attention Mechanism
With the query, key, and value as inputs, the original self-attention mechanism is defined as [56]

$$\mathcal{A}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d}}\right)V,$$

where $Q \in \mathbb{R}^{L_Q \times d}$, $K \in \mathbb{R}^{L_K \times d}$, $V \in \mathbb{R}^{L_V \times d}$, and $d$ denotes the input dimension. Derived from [62], the $i$-th query's attention can be defined with kernel smoothing as

$$\mathcal{A}(q_i, K, V) = \sum_{j} \frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}\, v_j,$$

where $q_i$, $k_j$, $v_j$ stand for the corresponding rows in $Q$, $K$, $V$, respectively, and $k(q_i, k_j) = \exp\left(q_i k_j^{\top}/\sqrt{d}\right)$.

The probability $p(k_j \mid q_i) = k(q_i, k_j)/\sum_{l} k(q_i, k_l)$ must be computed for every query-key pair, which entails a large $O(L_Q L_K)$ memory usage. Therefore, the Informer deep learning model proposes a query sparsity measurement to tackle this major defect of self-attention. The similarity between $p$ and the uniform distribution $q$ can be used to distinguish the importance of queries, which can be evaluated through the Kullback-Leibler divergence as

$$KL(q \parallel p) = \ln \sum_{l=1}^{L_K} e^{q_i k_l^{\top}/\sqrt{d}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}} - \ln L_K.$$

Dropping the constant, the sparsity measurement of the $i$-th query is defined as

$$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^{\top}/\sqrt{d}} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}},$$

where the first term is the Log-Sum-Exp (LSE) over all keys and the second term is their arithmetic mean [63]. If $M(q_i, K)$ grows larger, the attention probability $p$ is more diverse and more likely to contain the dominant dot-product pairs, thus having a superior differentiating capability. According to the above measurement, the ProbSparse self-attention mechanism can then be conducted by restricting the keys to the top-$u$ queries as

$$\mathcal{A}(Q, K, V) = \mathrm{Softmax}\left(\frac{\overline{Q}K^{\top}}{\sqrt{d}}\right)V,$$

where $\overline{Q}$ is a sparse matrix of the same size as $Q$ that contains only the top-$u$ queries. When $u = c \cdot \ln L_Q$, the per-layer memory usage is reduced to $O(L_K \ln L_Q)$ owing to the lessened calculation for each key. Nevertheless, the query sparsity measurement itself needs a quadratic $O(L_Q L_K)$ calculation, and the LSE operation is not always numerically stable. Hence, an empirical approximation is conducted.
For each $q_i$, the discrete keys can be treated as continuous vectors $k_j$. The first term of $M(q_i, K)$ is then the LSE of the inner products of a fixed query $q_i$ with all keys; define

$$f_i(K) = \ln \sum_{j=1}^{L_K} e^{q_i k_j^{\top}/\sqrt{d}}.$$

From the Log-Sum-Exp network and related studies [63,64], the convex function $f_i(K)$ combined with a term linear in each $k_j$ makes $M(q_i, K)$ convex. Hence, the measurement can be differentiated with respect to each vector $k_j$ as

$$\nabla_{k_j} M(q_i, K) = \frac{e^{q_i k_j^{\top}/\sqrt{d}}}{\sum_{l} e^{q_i k_l^{\top}/\sqrt{d}}} \cdot \frac{q_i^{\top}}{\sqrt{d}} - \frac{1}{L_K} \cdot \frac{q_i^{\top}}{\sqrt{d}}.$$

Letting $\nabla M(q_i, K) = 0$ to reach the minimum, the condition becomes

$$\frac{e^{q_i k_j^{\top}/\sqrt{d}}}{\sum_{l} e^{q_i k_l^{\top}/\sqrt{d}}} = \frac{1}{L_K}, \quad j = 1, \ldots, L_K.$$

The minimum value $\ln L_K$ is obtained when $k_1 = k_2 = \cdots = k_{L_K}$, so $M(q_i, K) \geq \ln L_K$. Moreover, by picking the largest inner product $\max_j \{q_i k_j^{\top}/\sqrt{d}\}$ to bound the LSE term from above, the corresponding inequality can be derived. Combining the above, the bound can be denoted as

$$\ln L_K \leq M(q_i, K) \leq \max_{j}\left\{\frac{q_i k_j^{\top}}{\sqrt{d}}\right\} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}} + \ln L_K,$$

where $q_i \in \mathbb{R}^d$ and the $k_j \in \mathbb{R}^d$ are in the key set $K$. From the above deduction, the max-mean measurement can be defined as

$$\overline{M}(q_i, K) = \max_{j}\left\{\frac{q_i k_j^{\top}}{\sqrt{d}}\right\} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}}.$$

Specifically, a long-tail distribution pattern of the self-attention scores was observed through qualitative assessment [54]: only a few dot-product pairs contribute to the major attention. Hence, $\overline{M}(q_i, K)$ only requires $U = L_K \ln L_Q$ randomly sampled dot-product pairs, with the remaining pairs filled with zeros. This operation has a weaker sensitivity to zero values and remains numerically stable. Eventually, in practical applications, with approximately equal input lengths $L_Q = L_K = L$, the total complexity of the ProbSparse self-attention computation is reduced to $O(L \ln L)$.
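To make the above deduction concrete, the following is a minimal NumPy sketch of ProbSparse query selection. It is an illustration adapted from the description above, not the authors' implementation; the sampling factor, the array shapes, and the mean-of-V fallback for unselected queries are assumptions of this example.

```python
import numpy as np

def probsparse_attention(Q, K, V, factor=5):
    """Illustrative ProbSparse self-attention: score queries with the
    sampled max-mean measurement, run exact attention only for the
    top-u queries, and let the remaining 'lazy' queries fall back to
    the mean of V (their attention is close to uniform)."""
    L_Q, d = Q.shape
    L_K = K.shape[0]

    # Sample roughly c*ln(L_K) keys per query to estimate M_bar(q_i, K).
    n_sample = min(L_K, int(factor * np.ceil(np.log(L_K))))
    idx = np.random.randint(0, L_K, size=(L_Q, n_sample))
    sampled = np.einsum('qd,qkd->qk', Q, K[idx]) / np.sqrt(d)
    m_bar = sampled.max(axis=1) - sampled.mean(axis=1)  # max-mean measurement

    # Keep the top-u queries, u = c*ln(L_Q).
    u = min(L_Q, int(factor * np.ceil(np.log(L_Q))))
    top = np.argsort(m_bar)[-u:]

    # Exact softmax attention for the active queries only: O(u * L_K) work.
    out = np.repeat(V.mean(axis=0, keepdims=True), L_Q, axis=0)
    scores = Q[top] @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    out[top] = (w / w.sum(axis=1, keepdims=True)) @ V
    return out

# Toy check: a 128-step sequence with model width 16.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
print(probsparse_attention(Q, K, V).shape)  # (128, 16)
```

Since only u ≈ c·ln L_Q rows receive exact attention and each query is scored on a sampled key subset, the dominant cost drops from O(L²) toward the O(L ln L) regime quoted above.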

Encoder
The Informer deep learning model utilizes the encoder architecture to extract the long-term dependency of the input series, where the $t$-th input $X^t$ is reshaped into a matrix $X^t_{en} \in \mathbb{R}^{L_x \times d_{model}}$ [56]. The encoder is composed of multiple identical layers stacked on top of each other. Specifically, the architecture of a single stack in the encoder of the Informer deep learning model is given in Figure 6. Due to the processing of the ProbSparse self-attention mechanism, the encoder is loaded with redundant combinations of the value $V$. Hence, self-attention distilling is proposed to concentrate the self-attention maps passed to the next layer.

Based on the dilated convolution [65], the distilling operation feeds forward from the $j$-th layer to the $(j+1)$-th layer as

$$X^t_{j+1} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left([X^t_j]_{AB}\right)\right)\right),$$

where $[\cdot]_{AB}$ denotes the attention block, and $\mathrm{Conv1d}(\cdot)$ performs a 1D convolutional filter with the $\mathrm{ELU}(\cdot)$ activation function [66]. The stride-2 max-pooling layer is added to reduce the total memory usage to $O((2 - \varepsilon)L \log L)$. Furthermore, a pyramid-like processing structure (shown in Figure 6) is established, where the inputs are halved to serve as replicas of the main stack and the distilling layers drop gradually. In this way, the operation has better robustness, and the resulting dimensions of the different layers are consistent.
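As an illustration, a simplified PyTorch sketch of one distilling step is given below; the kernel size, padding mode, and omission of normalization layers are assumptions of this example rather than details confirmed by the text.

```python
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    """One self-attention distilling step between encoder attention blocks:
    Conv1d + ELU followed by stride-2 max-pooling halves the sequence
    length handed to the (j+1)-th layer."""
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.activation = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); Conv1d expects channels first.
        y = self.pool(self.activation(self.conv(x.transpose(1, 2))))
        return y.transpose(1, 2)  # (batch, ceil(seq_len / 2), d_model)

x = torch.randn(8, 96, 512)           # toy attention-block output
print(DistillingLayer(512)(x).shape)  # torch.Size([8, 48, 512])
```

Stacking such layers between attention blocks produces the pyramid of Figure 6: each level works on half the sequence length of the previous one.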

Decoder
The canonical decoder structure is optimized with generative inference to mitigate the long-term speed descent. The decoder input is defined as

$$X^t_{de} = \mathrm{Concat}\left(X^t_{token}, X^t_0\right) \in \mathbb{R}^{(L_{token} + L_y) \times d_{model}},$$

where $X^t_{token} \in \mathbb{R}^{L_{token} \times d_{model}}$ is the start token, and $X^t_0 \in \mathbb{R}^{L_y \times d_{model}}$ is the placeholder for the target sequence.
Extended from dynamic decoding [67], the procedure is innovated to sample an $L_{token}$-long series from the input sequence as the start token and then feed it to the decoder as $X_{de} = \{X_L, X_0\}$. Afterwards, the decoder obtains the outputs through a single forward procedure, and thus it processes with less time consumption than a normal encoder-decoder architecture.
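A short PyTorch sketch of this input assembly follows; the tensor shapes and the slice-from-the-end choice for the start token are illustrative assumptions.

```python
import torch

def build_decoder_input(x_known: torch.Tensor, label_len: int, pred_len: int) -> torch.Tensor:
    """Assemble the generative-decoder input: an L_token start slice taken
    from the end of the known series, concatenated with a zero placeholder
    for the L_y target positions, so one forward pass yields the whole
    prediction horizon (no step-by-step dynamic decoding)."""
    start_token = x_known[:, -label_len:, :]                        # X_token
    placeholder = x_known.new_zeros(x_known.size(0), pred_len,
                                    x_known.size(2))                # X_0
    return torch.cat([start_token, placeholder], dim=1)            # X_de

x = torch.randn(4, 96, 7)  # (batch, seq_len, features)
print(build_decoder_input(x, label_len=48, pred_len=24).shape)  # torch.Size([4, 72, 7])
```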

Identification of the Parameters of the Informer Deep Learning Model
The inputs of the Informer deep learning model are the temperature T = {890, 920, 950} °C, the true strain ε ∈ [0, 1], and the strain rate ε̇ = {0.001, 0.01, 0.1, 1} s⁻¹. The input sequences were preprocessed by concatenating the experimental true stress values measured under the different temperatures, true strains, and strain rates. The corresponding temperature, true strain, and strain rate values were also concatenated into the sequences. Then, these sequences were applied as training inputs. The experimental data are shuffled, with 7/10 of the total amount used for training and the rest for testing and validating the model.
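A possible preprocessing sketch in NumPy is given below; the record layout, key names, and point-wise split are assumptions for illustration, since the text does not specify its data structures.

```python
import numpy as np

def build_dataset(tests, train_frac=0.7, seed=0):
    """Concatenate per-test records into (T, strain, strain_rate) -> stress
    pairs and split them into training and test/validation subsets.

    tests: list of dicts with entries under the keys
           'T' (degC, constant per test), 'strain', 'rate' (1/s), 'stress' (MPa).
    """
    X, y = [], []
    for t in tests:
        n = len(t['stress'])
        X.append(np.column_stack([np.full(n, t['T']), t['strain'], t['rate']]))
        y.append(t['stress'])
    X, y = np.concatenate(X), np.concatenate(y)

    # Shuffle, then keep 7/10 of the points for training.
    order = np.random.default_rng(seed).permutation(len(y))
    split = int(train_frac * len(y))
    return (X[order[:split]], y[order[:split]]), (X[order[split:]], y[order[split:]])

# Synthetic placeholder record (not measured data), just to show the shapes.
toy = [{'T': 890, 'strain': np.linspace(0, 1.2, 6),
        'rate': np.full(6, 0.01), 'stress': np.zeros(6)}]
(train_X, train_y), (rest_X, rest_y) = build_dataset(toy)
print(train_X.shape, rest_X.shape)  # (4, 3) (2, 3)
```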
As discussed above regarding the architecture of the Informer deep learning model in Sections 4.1-4.3 and the features of general deep neural networks, the Informer deep learning model should first be established by tuning hyper-parameters such as the learning rate, input batch size, dropout, etc. To obtain the optimal parameters, the correlation coefficient (R), average absolute relative error (AARE), mean squared error (MSE), and root-mean squared error (RMSE) assessment criteria are employed for evaluating the results:

$$R = \frac{\sum_{i=1}^{N}(M_i - \overline{M})(P_i - \overline{P})}{\sqrt{\sum_{i=1}^{N}(M_i - \overline{M})^2 \sum_{i=1}^{N}(P_i - \overline{P})^2}}, \quad AARE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{M_i - P_i}{M_i}\right| \times 100\%,$$

$$MSE = \frac{1}{N}\sum_{i=1}^{N}(M_i - P_i)^2, \quad RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(M_i - P_i)^2},$$

where $N$ denotes the total amount of result data, $M_i$ and $P_i$ stand for the measured and predicted results, and $\overline{M}$ and $\overline{P}$ are the corresponding mean values. Generally, the accuracy and generalization ability of deep learning models are affected by various hyper-parameters. In the case of forecasting, the batch size of the input sequences and the initial learning rate of the model play crucial roles. On the one hand, a larger batch size allows faster training but may result in worse model accuracy and an unstable training process [68]. On the other hand, a smaller batch size is beneficial for generalization but can lead to a longer computation time [69]. Additionally, both theoretical and empirical evidence has proven that the batch size and learning rate significantly impact the generalization ability and accuracy of deep learning models [70][71][72]. To further explore the relationship between these two parameters and the results, experimental curves are displayed in Figure 7. The five curves represent the effect of the learning rate on the validation loss under different batch sizes. Specifically, the learning rate is tested over a uniformly spaced range from 10⁻¹ to 10⁻⁶ with batch sizes of 8, 16, 32, 64, and 128, respectively. The model accuracy is evaluated by the validation loss.
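These criteria translate directly into a few lines of NumPy; the helper name and output layout are assumptions of this sketch.

```python
import numpy as np

def assess(measured: np.ndarray, predicted: np.ndarray) -> dict:
    """Compute the four assessment criteria defined above."""
    m = measured - measured.mean()
    p = predicted - predicted.mean()
    r = (m * p).sum() / np.sqrt((m ** 2).sum() * (p ** 2).sum())
    aare = 100.0 * np.mean(np.abs((measured - predicted) / measured))
    mse = np.mean((measured - predicted) ** 2)
    return {'R': r, 'AARE_%': aare, 'MSE': mse, 'RMSE': np.sqrt(mse)}
```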
It is clear that the validation loss of the Informer deep learning model drops to a minimum value and then starts to fluctuate as the learning rate increases from 10⁻⁴ to 2 × 10⁻³. As the learning rate further ascends, the fluctuation of the validation loss becomes intense; thus, the optimal learning rate is chosen as 1.2929 × 10⁻³. Specifically, the curve fluctuations in Figure 7 demonstrate an appropriate balance of model accuracy and training stability under a batch size of 64. Hence, the batch size is determined as 64. In addition, it is important to note that the sequence length and label length also have a significant impact on accuracy. Based on the experimental results, the optimal sequence length and label length are identified as two and one, respectively.
Eventually, the values of R, AARE, and RMSE are computed as 0.9986, 4.191%, and 2.2016, respectively. According to these results, the performance of the Informer deep learning model is shown in Figure 8, which illustrates good consistency between the experimental data and the modeled results, demonstrating the great capability of the Informer deep learning model to describe the high-temperature deformation features of the researched titanium alloy.

Comparisons and Discussion
As shown in the above sections, the Informer deep learning model exhibits a strong forecasting ability for the true stress of the researched titanium alloy. According to the authors' previous investigation [6], a physical mechanism (PM) model was constructed for forecasting the true stress of the researched titanium alloy. In that model, σ is the flow stress, σ_y is the short-range stress component, and σ_ρ is the dislocation interaction stress; ε̇ is the strain rate, R is the gas constant, T is the absolute temperature, ρ_i is the dislocation density, ρ̇_i+ is the dislocation density generation rate under WH, and ρ̇_i,DRV and ρ̇_i,DRX are the dislocation density variation rates due to dynamic recovery (DRV) and DRX, respectively; Λ is the mean free path of dislocations, d_i is the average grain size, X is the DRX fraction with rate Ẋ, M_b is the grain boundary movement rate, P is the driving force, D_ob is the self-diffusion factor, and δ is the grain boundary thickness. Figure 9 unveils the comparative analysis of the forecasting performances of the PM model and the Informer deep learning model. Compared to the PM model, the Informer deep learning model enjoys a smaller forecasting error of the true stresses, particularly for the researched titanium alloy at the lower compression temperature (890 °C) or higher strain rates. To validate the forecasting capability, the correlations between the forecasted true stresses and the tested ones are plotted in Figure 10. Clearly, the scatter points of the PM constitutive model are more dispersed, while those of the Informer deep learning model are more centralized. Meanwhile, the values of R, AARE, and RMSE are determined, as noted in Table 1. Distinctly, the relatively larger R as well as the smaller AARE and RMSE values imply that the established Informer deep learning model can accurately depict the hot compression features of the Ti-55511 alloy.

Conclusions
The evolving characteristics of the microstructures and flow behavior of a Ti-55511 alloy in two-stage thermal compression experiments with step-like strain rates are researched. The main conclusions are drawn as follows:
(1) In high-temperature compression, the influences of the forming parameters on the flow behaviors of the researched Ti-55511 alloy are significant. The flow stresses are reduced with increasing compression temperature. Notwithstanding, the flow stresses at stage II of thermal compression display an increasing trend with the rise of ε_I or the increase in ε̇_I/ε̇_II;
(2) The formation of high-density dislocation networks/clusters through dislocation concentration/interaction is suppressed with increasing compression temperature. Nevertheless, dislocation nucleation/concentration is enhanced with increasing ε_I;
(3) The Informer deep learning model is developed to reconstruct the thermal compression characteristics of the researched Ti-55511 alloy. The considerable agreement between the predicted true stresses and the experimental results demonstrates the high prediction accuracy of the Informer deep learning model.